105 research outputs found

Automatic Task Parallelization of Dataflow Graphs in ML/DL models

Author: Das Srinjoy
Rauchwerger Lawrence
Publication date: 22/08/2023

Several methods exist today to accelerate Machine Learning (ML) or Deep Learning (DL) model performance for training and inference. However, modern techniques that rely on various graph and operator parallelism methodologies depend on search space optimizations, which are costly in terms of power and hardware usage. Especially in the case of inference, when the batch size is 1 and execution is on CPUs or on power-constrained edge devices, current techniques can become costly, complicated or inapplicable. To ameliorate this, we present a Critical-Path-based Linear Clustering approach to exploit inherent parallel paths in ML dataflow graphs. Our task parallelization approach further optimizes the structure of graphs via cloning and prunes them via constant propagation and dead-code elimination. Contrary to other work, we generate readable and executable parallel PyTorch+Python code from input ML models in ONNX format via a new tool that we have built called Ramiel. This allows us to benefit from other downstream acceleration techniques like intra-op parallelism and potentially pipeline parallelism. Our preliminary results on several ML graphs demonstrate up to 1.9× speedup over serial execution and outperform some of the current mechanisms in both compile and runtimes. Lastly, our methods are lightweight and fast enough so that they can be used effectively for power and resource-constrained devices, while still enabling downstream optimizations.

arXiv.org e-Print Archive

The Potential of Synergistic Static, Dynamic and Speculative Loop Nest Optimizations for Automatic Parallelization

Author: Baghdadi Riyadh
Bastoul Cedric
Cohen Albert
Pouchet Louis-Noel
Rauchwerger Lawrence
Publication date: 01/01/2010

Research in automatic parallelization of loop-centric programs started with static analysis, then broadened its arsenal to include dynamic inspection-execution and speculative execution, the best results involving hybrid static-dynamic schemes. Beyond the detection of parallelism in a sequential program, scalable parallelization on many-core processors involves hard and interesting parallelism adaptation and mapping challenges. These challenges include tailoring data locality to the memory hierarchy, structuring independent tasks hierarchically to exploit multiple levels of parallelism, tuning the synchronization grain, balancing the execution load, decoupling the execution into thread-level pipelines, and leveraging heterogeneous hardware with specialized accelerators. The polyhedral framework makes it possible to model, construct and apply very complex loop nest transformations addressing most of the parallelism adaptation and mapping challenges. But apart from hardware-specific, back-end oriented transformations (if-conversion, trace scheduling, value prediction), loop nest optimization has essentially ignored dynamic and speculative techniques. Research in polyhedral compilation recently reached a significant milestone towards the support of dynamic, data-dependent control flow. This opens a large avenue for blending dynamic analyses and speculative techniques with advanced loop nest optimizations. Selecting real-world examples from SPEC benchmarks and numerical kernels, we make a case for the design of synergistic static, dynamic and speculative loop transformation techniques. We also sketch the embedding of dynamic information, including speculative assumptions, in the heart of affine transformation search spaces.

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Rennes 1

Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors

Author: Akkary H.
Cintra M.
Figueiredo R.
Garzarán M. J.
Gopal S.
Gupta M.
Hammond L.
Josep Torrellas
José María Llabería
Knight T.
Lawrence Rauchwerger
Marcuello P.
María Jesús Garzarán
Milos Prvulovic
Prvulovic M.
Rauchwerger L.
Rundberg P.
Sohi G. S.
Steffan J.
Tremblay M.
Víctor Viñals
Zhang Y.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Automatically Harnessing Sparse Acceleration

Sparse linear algebra is central to many scientific programs, yet compilers fail to optimize it well. High-performance libraries are available, but adoption costs are significant. Moreover, libraries tie programs into vendor-specific software and hardware ecosystems, creating non-portable code. In this paper, we develop a new approach based on our specification Language for implementers of Linear Algebra Computations (LiLAC). Rather than requiring the application developer to (re)write every program for a given library, the burden is shifted to a one-off description by the library implementer. The LiLAC-enabled compiler uses this to insert appropriate library routines without source code changes. LiLAC provides automatic data marshaling, maintaining state between calls and minimizing data transfers. Appropriate places for library insertion are detected in compiler intermediate representation, independent of source languages. We evaluated on large-scale scientific applications written in FORTRAN; standard C/C++ and FORTRAN benchmarks; and C++ graph analytics kernels. Across heterogeneous platforms, applications and data sets we show speedups of 1.1× to over 10× without user intervention. Comment: Accepted to CC 202

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

New insights into the synergism of nucleoside analogs with radiotherapy

Author: A Cohen
A Fyrberg
A Neschadim
A Sandoval
A Saven
A Zhenchuk
AD Bulgar
AD Seidman
AR Pettitt
B Ewald
B Lund
B Pauwels
BD Cheson
Bo Xu
C Nabhan
C Smal
C Xie
C Yang
CE Cass
CF Pollera
CM Galmarini
CO Rodriguez Jr
CU Lambe
D Genini
D Latz
DA Carson
DR Rauchwerger
DS Shewach
E Sabini
ER Giblett
ES Arner
ES Casper
EW Gelfand
FJ Keith
FM Wachters
GP Leung
H Anderson
H Kawasaki
HB Latourette
HM Kantarjian
J Bernier
J Carmichael
J Griffig
J Sigmond
JA Montgomery
JA Ubersax
JK Lamba
JK Owens
JR Mackey
JS Ryu
JWG van Putten
K Bhalla
K Fabianowska-Majewska
K Lotfi
K Ohmine
KL Prus
KM King
L Danhauser
L Li
L Taricani
L Wang
LD Piro
LE Robertson
M Grever
M Iacobini
M Johansson
M Moore
M Nitsche
M Sundaram
MA Stackhouse
Michael W Lee
MJ Cariveau
MJ Pugmire
NA Kocabas
NM Chandler
P Hatzis
P Hentosh
P Huang
P Rossolillo
PL Bonate
R Amsailale
RL Capizzi
RP Abratt
RW Brockman
S Hazra
S Nagai
S Seto
T McSorley
T Yamauchi
TA Krenitsky
TS Lawrence
U Consoli
V Gandhi
V Gregoire
V Heinemann
V Verhoef
VI Avramis
VL Damaraju
VM Santana
W Plunkett
WB Parker
William B Parker
Y Saiki
Y Zhang
Z Csapo
ZS Chen
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2013

Nucleoside analogs have been frequently used in combination with radiotherapy in the clinical setting, as it has long been understood that inhibition of DNA repair pathways is an important means by which many nucleoside analogs synergize. Recent advances in our understanding of the structure and function of deoxycytidine kinase (dCK), a critical enzyme required for the anti-tumor activity of many nucleoside analogs, have clarified the mechanistic role this kinase plays in chemo- and radio-sensitization. A heretofore unrecognized role of dCK in the DNA damage response and cell cycle machinery has helped explain the synergistic effect of these agents with radiotherapy. Since most currently employed nucleoside analogs are primarily activated by dCK, these findings lend fresh impetus to efforts focused on profiling and modulating dCK expression and activity in tumors. In this review we briefly survey the pharmacology and biochemistry of the major nucleoside analogs in clinical use that are activated by dCK. This is followed by discussions of recent advances in our understanding of dCK activation via post-translational modifications in response to radiation and of current strategies aimed at enhancing this activity in cancer cells.

Crossref

Springer - Publisher Connector

PubMed Central

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Run-Time Parallelization: A Framework For Parallel Computation

Author: Lawrence Rauchwerger

The goal of parallelizing, or restructuring, compilers is to detect and exploit parallelism in sequential programs written in conventional languages. Current parallelizing compilers do a reasonable job of extracting parallelism from programs with regular, statically analyzable access patterns. However, if the memory access pattern of the program is input data dependent, then static data dependence analysis and consequently parallelization is impossible. Moreover, in this case the compiler cannot apply privatization and reduction parallelization, the transformations that have been proven to be the most effective in removing data dependences and increasing the amount of exploitable parallelism in the program. Typical examples of irregular, dynamic applications are complex simulations such as SPICE for circuit simulation, DYNA-3D for structural mechanics modeling, DMOL for quantum mechanical simulation of molecules, and CHARMM for molecular dynamics simulation of organic systems. Therefor..

CiteSeerX

Run-Time Parallelization: Its Time Has Come

Author: Lawrence Rauchwerger
Publication date: 01/01/1998

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. This type of loop mostly occurs in irregular, dynamic applications, which represent more than 50% of all applications [20]. Making parallel computing succeed has therefore become conditioned by the ability of compilers to analyze and extract the parallelism from irregular applications. In this paper we present a survey of techniques that can complement the current compiler capabilities by performing some form of data dependence analysis during program execution, when all information is available. After describing the problem of loop parallelization and its difficulties, a general overview of the need for techniques of run-time parallelization is given. A survey of the various approaches to parallelizing partially parallel loops and fully parallel loops is presented. Special emphasis is placed on two parallelism-enabling transformations, privatization and reduction parallelization, because of their proven efficiency. The technique of speculatively parallelizing doall loops is presented in more detail. This survey limits itself to the domain of Fortran applications parallelized mostly in the shared memory paradigm. Related work from the field of parallel debugging and parallel simulation is also described.

CiteSeerX

Languages and compilers for parallel computing: 30th international workshop, LCPC 2017, College Station, TX, USA, October 11-13, 2017, revised selected papers

Author: Rauchwerger Lawrence
Publication venue: Springer International Publishing AG
Publication date: 01/01/2019

CERN Document Server